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Abstract 

We derive a composite centrality measure for general 
weighted and directed complex networks, based on 
measure standardisation and invariant statistical 
inheritance. Different inheritance schemes generate 
different intermediate abstract measures providing 
additional information, while the overall composite 
centrality measure tends to the standard log-normal 
distribution. This offers a unified scale to measure 
node and edge centralities for complex evolving 
networks inside a uniform framework. Considering 
the two real-world cases of the world trade web and 
the world migration web, both during a time span 
of 40 years, we propose a standard framework and 
demonstrate its remarkable normative power and 
accuracy. We demonstrate the applicability of the 
proposed framework to large and arbitrary complex 
systems by the means of numerical simulations. 

Keywords, complex system, data mining, data anal- 
ysis methodology, weighted directed network, evolving 
network, unified scale, composite centrality, world trade 
web, world migration web 



1 Introduction 

Stalling with the work of Watts, Strogatz, Barabasi, Al- 
bert and Newman ITJ|2j[3l, the investigation of complex 
networks has attracted an inflationary amount of attention 
from numerous research fields due to their ubiquity in the 
real world |H[5j[6]|. One of the major fascinations lies in 
some elegant and efficient descriptions of very different 
complex systems under the general framework of modern 
graph theory EMU pioneered by Erdos (9J. 
As the awareness of several common dynamic characters 
of many real-world networks rises, ever more effort has 
been devoted to understanding the temporal evolution of 
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complex networks. Prominent examples of evolving net- 
works are the internet and social networks (51 , transporta- 
tion networks [ 10] and the world trade web lfTTl[T2ll . Con- 
sidering the description and analysis of evolving network 
structures, most of these efforts have been made regarding 
network modelling (TJ EE! US US [13, while the devel- 
opment of sophisticated analysis tools and methodologies 
has seen less progress. This yet will be the topic of this 
work. 

The characterisation and classification of general complex 
systems and especially complex evolving networks pose 
three major challenges: 

• Uniformity: There is a large variety of network mea- 
sures stretching over wide numerical ranges, but 
there is no standardised procedure today to consis- 
tently consider several measures simultaneously. 

• Variability: Observed over time, many complex net- 
works show growth (change of the number of nodes) 
and evolution (change of the topology). Network 
measures often depend explicitly on these quantities, 
which complicates a coherent temporal analysis. 

• Comparability: There is no unified scale on which 
one can compare results originating from different 
networks. 

Motivated by these problems, we propose herer a new 
centrality framework, called composite centrality (CC). 
The ideas behind the CC-framework are that one first de- 
fines a set of characteristics of interest, and then chooses 
appropriate network measures. Next, one implements a 
standardisation procedure involving a non-linear trans- 
formation and statistical normalisation. Relying on sta- 
tistical methods, uniformity and variability are accounted 
for. Standardised measures can be combined using invari- 
ant inheritance schemes to form new standardised mea- 
sures carrying abstract physical meanings. Once back- 
transformed, these are called composite centrality. It turns 
out that final CC-scores for different set-ups and differ- 
ent networks are well-approximated by the standard log- 
normal distribution, which presents an universal scale to 
compare scores for different set-ups and even different 



1 



In Section II 



we 



networks (comparability). 
This paper is structured as follows, 
give a short introduction to the relevant terminology from 
graph theory and a recipe for graph measure standardis- 
ation. In |Section fTTj we present the CC-framework and 



propose a specific standard framework. We demonstrate 
the working of the proposed set-up by considering the 
two cases of the world trade web and the world migration 
web, both during a time span of 40 years. Furthermore, 
a graphical tool, which we call the network genetic fin- 
gerprint is introduced. It allows for efficient analysis and 
monitoring of composite centrality measures. In Section| 
IV we discuss the validity and limitations of the proposed 
framework by the means of large-scale simulations. We 



finally conclude the report in Section V 



2 Preliminaries 

2.1 Graph Theory 

In this section, we give a short introduction to the parts of 
graph theory which are needed in the following. Explicit 
formulas will be given if deemed necessary. For a more 
detailed introduction we refer to 131[5]|6lG). 
Graph theory provides a general mathematical frame- 
work to represent and quantify complex networks and 
their properties. A weighted and directed network can 
be represented by a graph D = (V,E), where V = 
{vi, . . . , v;v} is the set of N > 2 nodes (vertices) in the 
graph. E (w, ; - > 0| i, j G { 1 , . . . , N}) is the set of weighted 
edges from node v,- to node vj, with N e = ord{E) denot- 
ing the number of edges irrespective of their weights. The 
whole graph can be represented by a real weight matrix, 
W = [w u ] G R NxN (Wij / Wj h in general). We do not al- 
low for self-loops here, i.e. wu = for all i G { 1 , . . . , N}. 
The I th row or column represents the out- or in-strength 
distribution of node i, respectively. The total strength of 
a node i, denoted by Si, is the sum of in-strengths sf and 
out-strengths sf ut of that node. It represents a generalisa- 
tion of the degree centrality (number of adjacent edges) 
in an undirected and unweighted graph. The degree of a 
node can be obtained from the underlying simple graph 
(non-weighted, non-directed, no self-loops) through its 
adjacency matrix A = [atj] = [ajj] G {0, 1}, where a\j = 1 
if there is an edge between node i and node j, but a,y = 
otherwise. Likewise, since self-loops are not allowed, one 
has an = for all i G { 1 , . . . , N}. The degree of node i is 
given by the / th row/column sum of A. Strength and de- 
gree of a node can be interpreted as two measures for local 
connectivity characterising quantity and quality, respec- 
tively. It is said that there is a connection between any two 
nodes i and j in D if there exists a directed path pu from i 
to j (ptj ^ pp, in general). A directed graph is said to be 
strongly-connected if there exists a directed path between 
any two nodes. This means that the weight matrix W and 
the adjacency matrix A are both is irreducible. A measure 



for the connectivity of a graph on the global scale is the 
edge density p e = N e / [N 2 —N) (number of actual edges 
divided by number of possible edges), while on a local 
scale the connectivity may be expressed via the average 
clustering coefficient. On the adjacency level, the clus- 
tering coefficient of a node is defined as the number of 
actual connections within its neighbours over the number 
of possible connections among them. A further important 
measure to quantify the overall connectivity of a node i 
is the average shortest path length = (/,;)•, i.e. the av- 
erage number of steps (over unweighted edges) which it 
takes to get from node i to any other node j (a generali- 
sation to weighted edges is straightforward). Note that in 
a directed graph the shortest path between two nodes is 
generally not symmetric, i.e. Z,y ^ Ijj. The diameter of 
a graph is defined as the maximal shortest path between 
two nodes. The maximal flow fij between two nodes i and 
j is the maximal capacity that can be transported paral- 
lelly from node i to node j via the whole graph, where 
edge capacity can be described by edge weight (4[ (again 
fij + fji, in general). 

Graph asymmetry is a measure for the difference between 
Wij and Wji on a global scale, i.e. for the overall weight 
balance. We define it as 
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where || • \\p denotes the Frobenius norm of a ma- 
trix. The algebraic connectivity X\ is the second 
smallest eigenvalue of the normalised Laplacian 
Lfj = D 2 (D — W)D 2 , where D = diag(s\, . . . , sn) 
is a diagonal matrix consisting of the nodes' strengths. 
As its name suggests, it determines the robustness of a 
graph (4j [TH. A further measure is assortativity SHI, 
As G [—1,1], which describes the overall homogeneity 
of connections indicating if weak/strong nodes are 
preferentially coupled to other weak/strong nodes (or 
vice versa), resulting in a positive (or negative) As value. 
Eigenvector centrality of a node i is defined as the i tb 
entry of the eigenvector corresponding to the largest 
eigenvalue of the underlying graph's adjacency matrix. 
It measures how well a node is connected to the whole 
graph or to other well-connected (high-scoring) nodes [4 j. 



2.2 Measure Standardisation 

Different node measures (e.g. centrality measures) gen- 
erally span large and different numerical ranges, which 
makes them difficult to compare^] We address this 
point by defining the following normalisation procedure 
for positive- valued measures and strongly-connected net- 
works: 



'Here and throughout, we only consider node measures. A gener- 
alisation to edge measures is straightforward. 



2 



1. Every measure is made odd, i.e. the smaller the value 
the higher the score. As an example, consider av- 
erage shortest path length (ASPL) and total degree 
centrality. While ASPL is already an odd measure 
(i.e. a smaller numerical value stands for a better 
connectivity of a node), degree centrality is what we 
call a straight measure (i.e. the bigger the numerical 
value, the higher the score). The easiest way to make 
straight measures odd (or vice versa) is to invert nu- 
merical values. 

2. Every measure is normalised to one, i.e. the sum 
over all values of all nodes gives unity. Conse- 
quently, every value lies between zero and one, and 
the mean for all measures equals l/N. 

3. The logarithm is taken of every value for all mea- 
sures. This is a non-linear transformation which, in 
general, brings numerical values to the same order 
of magnitude. To recover the common mean and to 
make the results independent of the base of the log- 
arithm, we renormalise all measures to onJ^\ 

4. The variation of every measure about the common 
mean 1 /N is considered, i.e. 1 /N is subtracted from 
all values. Then, negative values represent a below- 
average score for the transformed measures. 

5. Every measure is statistically normalised, i.e. nu- 
merical values are expressed in terms of the sample 
standard deviation a s of the corresponding measure. 
Assuming that one has a representative data set (see 



Section IV I, this renders numerical values indepen- 



dent of the network size, while changes in a s may 
imply changes in the topology of the network, i.e. 
evolution over time. 

Measures that have been standardised have the same zero 
mean and show the same level of variation in terms of a 
unit standard deviation. Furthermore, they approximately 
span the same range of numerical values. Following the 
above recipe from the bottom to the top, one comes back 
to the original measures. In contrast to the original mea- 
sures the scores for standardised measures and their tem- 
poral evolution can be directly compared to each other. 

3 The CC-Framework 
3.1 Composite Centrality 

By only using tandardised measures (SM), it is straight- 
forward to construct arbitrary composite measures. Hav- 
ing two node measures, A and B, where A and B denote 
the corresponding SM, the composite centrality measure 



is defined as 



C C omp(A,5) = exp 



A + B 
o s (A+B) 



exp [C] , (2) 



where a s (-) stands for the sample standard deviation. In 
the second step, the quantity inside the square brackets 
C is defined as a new SM carrying an abstract physical 
meaning depending on the original measures. Having a 
set of n measures, M B Mj, i G { 1 ,...,«}, a generalisation 
of (|2]) takes the form 



Qomp (M) = exp 
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where we have defined the composite SM M comp . It is 
independent of the order the M, are combined, up to 
negligible statistical fluctuations. We may derive one 
further result from the form of C comp . Provided that 
the assumptions of the Lyapunov theorem |[T9l hold^] 
the central limit theorem from statistics (CLT) lfl9l 1201 
can be extended to include non-identically distributed 
random variables. We may then state that for a set of n 
independent random variables, which are represented by 
the SM Mj, the sampled random variable M comp tends to 
a standard normal distribution with zero mean and unit 
variance as the sample size (i.e. the number of measures) 
increases. Consequently, C comp is expected to (approxi- 
mately) follow a standard log-normal distribution. This 
sets an unified scale for composite centrality measures, 
because the limiting distribution is parameter-free, thus 
rendering node centrality scores comparable over time 
and for different networks. 



3.2 A Standard Framework: Direction, Range 
and Structure 

The formulation of the CC-framework has so far been 
general. In this section, we specify a certain set-up, 
which we propose as a standard framework (SF) for the 
investigation of general weighted and directed networks. 
We demonstrate the working of the SF by analysing 
data from the world trade web (WTW) [21] and the 
world migration web (WMW) |[22l . which can both be 
characterised as social networks. In both networks, nodes 
are countries and territories. In the WTW, edges are 
directed trade flows valued in USD reported during one 
calendar year, while in the WMW, edges are directed 
migrant flows reported since the last census. Data have 
been raised every year and every ten years, respectively. 
We consider observation periods of 40 years sliced into 
10-year intervals for both network^) WTW: 1970-2010, 
WMW: 1960-2000. Despite the fact that both networks 
are very different in nature and structure, it turns out 



2 Taking absolute values or dividing all values by 
lead to equivalent results. 
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Section IV 



This may not always be the case; see 

4 Basically, the maximal time span for which data are available. 
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that the SF is well-suited to analyse both networks, 
considering their properties as well as their temporal 
evolution. 

In the SF, we are interested in a node's centrality based on 
the criteria direction (in- and out-bounded connectivity, 
D: IN/OUT), range (long- and short-range connectivity, 
R: LO/SH) and structure (qualitative and quantitative 
connectivity, S: QL/QN). To achieve a high centrality 
score, a node must score well regarding all criteria. 
Since all three criteria are binarily divided, we need a 
total of 2 3 = 8 measures to characterise them. Using a 
specification of the form D-R-S, a possible choice of 



WTW : e th = 10 7 USD 



measures Msf is given in Tab. I The corresponding SM 
carry a bar and their averages, a hat. 



D-R-S 


description 


symbol 


IN-LO-QL 


in - coming AS PL 




IN-LO-QN 


in - coming max. flow 


fin 


IN-SH-QL 


in - degree 


din 


IN-SH-QN 


in - strength 


Sin 


OUT-LO-QL 


out - going ASPL 


fcut 


OUT-LO-QN 


out - going max. flow 


/out 


OUT-SH-QL 


out - degree 


^out 


OUT-SH-QN 


out - strength 


^out 



Table 1: SF measures based on the properties direction 
(D), range (R) and structure (S). 

In both cases of the WTW and the WMW, we consider 
the largest strongly-connected component of a thresh- 
old graph (LSCTG), i.e. edges with a strength below a 
certain threshold e t h are removed. This has the advan- 
tage to reduce the relative errors in the data, and fo- 
cuses the analysis on quantities of a certain minimal 
magnitude. The corresponding edge thresholds are set 



year 


1970 


1980 


1990 


2000 


2010 


N 


136 


154 


166 


197 


204 


N e 


2078 


4075 


4632 


6941 


9667 





4 


3 


3 


3 


3 


I 


2.02 


1.87 


1.87 


1.89 


1.81 


//io 9 


0.39 


2.06 


2.68 


4.19 


10.13 


d 


15 


26 


28 


35 


47 


f/10 8 


0.15 


0.76 


1.14 


1.57 


3.50 


WMW 


eth = 


2- 10 3 mi 


grants 




year 


1960 


1970 


1980 


1990 


2000 


N 


163 


171 


173 


175 


193 


N e 


1436 


1664 


1924 


2377 


2880 





4 


5 


3 


4 


4 


I 


2.75 


2.72 


2.52 


2.43 


2.41 


//10 5 


0.67 


0.81 


1.00 


1.38 


1.56 


d 


9 


10 


11 


14 


15 


s 


3423 


3535 


3880 


4513 


4397 



Table 2: Numeric values for selected standard measures 
for WTW (upper table) and WMW (lower table): number 
of nodes N, number of edges N e , diameter 0, ASPL I, 
average max. flow per other node /, average degree d, 
average strength per other node s. "per other node" means 
that those quantities have been divided by N — I, which 
gives a better understanding of QN-measures. 



Having defined the set-up in terms of Msf, it is straight- 



forward to standardise them using the recipe given in Sec- 
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tion II-B As pointed out there, the distribution of M 



comp 



th 



10 7 USD (without inflation adjustment) and should be (approximately) standard normal with mean 

H = and standard deviation a = 1. Hence, C comp should 
(approximately) follow a standard log-normal distribution 
with probability density ll23l 



2000 migrants, respectively. General properties 
of both networks and their evolution over time are shown 
in 



Tab. II and in Fig. 1 (upper panels). One can see that 
both evolving networks grow over time and show an in- 
crease in edge density (which is less pronounced in the 
WMW) and asymmetry (which is more pronounced in the 
WMW), as well as negative assortativity (atypical for so- 
cial networks [5]). The differences between the two net- 
works can be characterised as follows: The WTW can be 
described as homogeneous (high density and clustering, 
modest asymmetry, high algebraic connectivity), while 
the WMW can be characterised as heterogeneous (low 
density, modest clustering, high asymmetry, low algebraic 
connectivity). 



PlogN I 



1 



'2k 



exp 



In* 

~1~ 



(4) 



where x is a random variable. This represents a universal 
centrality distribution in the sense that there is no free pa- 
rameter involved, providing a unique scale for composite 
centralities across different networks. 
To demonstrate the applicability of the SF, we compute 
M c s D F mp and C c s F mp for the WTW and the WMW, and in- 
vestigate their evolution during a period of 40 years. In 
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Figure 1: (colour online) Upper row: Normalised mea- 
sures: coverage £ (fraction of edge data included in 
LSCTG), clustering coefficient of the underlying simple 
graph CI, algebraic connectivity X\ , graph asymmetry Ad, 
edge density p e , assortativity As. Bottom row: Evolution 
of C comp for selected countries (see text); WTW (left) and 
WMW (right). Country codes are given using the ISO 
3166-1 2-digit standard. 



Section IV this analysis is extended to larger synthetic 
data sets. 

The results showing the relation between C^ mp and the 
log-normal distribution Q are givven in Fig. 2 where 
data points from all years have been combined for better 
representatiorj^] The inner panels show an MLE fit of a 
normal distribution (green line) to the M^ mp distribution, 
whereby the frequency distributions (blue bars) are well 
approximated by the standard Gaussian. The outer panels 
show the cumulative distribution function (CDF) of the 
resulting standard log-normal distribution (red line). One 
can see that, in both cases, this resulting standard distribu- 
tion fits the data (unfilled circles) well. To quantitatively 
test the log-normal hypothesis, we perform a goodness- 
of-fit (GoF) analysis by calculating /^-values to e < 0.01 
precision using the Kolmogorov-Smirnov test (KS-test^] 
|[25l l26l : The KS-test compares the maximal distance 
(KS-statistic) between the empirical CDF and the hypo- 
thetical distribution's CDF to the corresponding distance 
for a set of synthetic samples. The /rvalue is the fraction 
of those samples where the empirical CDF is closer to the 
hypothetical CDF. 

Next, we need to define a decision rule (DR) for the ac- 
ceptance or rejection of the log-normal hypothesis. Fol- 
lowing 11251 , the log-normal hypothesis is accepted for a 
/7-value of p > 0.1. 

Tab. 3 shows that the log-normal hypothesis for the dis- 
tribution of composite node centralities using SF can be 
accepted during the whole observation period, where we 

5 This is possible since they are all expected to follow the same 
distribution. 

6 Since the resulting M^ mv is expected to be approximately normal- 
distributed, we find the Shaipiro-Wilk test for normality 1241 too strin- 
gent. 



std. log-normal distribution of C^ mll (WTW, all times) 




(a) WTW 



std. log-normal distribution of C^ mv (WMW, all times) 




q^ p = exp [AC,,,, 

(b) WMW 

Figure 2: (colour online) Gaussian fit to M C omp distribu- 
tion using maximal likelihood estimators (MLE): Stan- 
dard Gaussian (data and fit, inner panel). Cumulative dis- 
tribution function (CDF) of resulting standard log-normal 
distribution of C comp (data and fit, outer panel): (a) WTW, 
(b) WMW. The data of both networks have been com- 
bined for the corresponding 40-year observation period. 



also considered the influence of different edge thresholds 
and alternative sets of node measures (see Section IV-B). 
We conclude that it is indeed possible to describe node 
centralities and their evolution over time using the univer- 
sal distribution Q, with mean yfe and median 1. This 
result provides a powerful tool for the uniform investiga- 
tion of complex evolving networks. 
To illustrate the evolution of C^ mp and to demonstrate 
what one can learn from a composite centrality analy- 
sis, we consider |Fig. 1| (lower panels): The evolution 



of C^ mp for a group of eight selected countries in the 
WTW (left) and the WMW (right) is shown, respectively. 
The four countries, United States (US), Germany (DE), 
Great Britain (GB) and Japan (JP), represent industri- 
alised economies, while Brazil (BR), Russia (RU), India 
(IN) and China (CH) form the BRIC-block of developing 
economies. 

Let us first consider the WTW. One can clearly see how 
the two groups converge after 1990. This convergence 
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WTW 



year 


N 


e lh /W b 


PSF 


Palt 




146 


5 


.66 


.23 


1970 


136 


10 


.72 


.31 




113 


20 


.44 


.51 




166 


5 


.88 


.58 


1980 


154 


10 


.84 


.55 




142 


20 


.75 


.42 




172 


5 


.64 


.59 


1990 


166 


10 


.71 


.67 




159 


20 


.70 


.61 




208 


5 


.88 


.85 


2000 


197 


10 


.87 


.84 




1 on 

18 / 


2U 


.93 


.11 




214 


5 


.53 


.20 


2010 


204 


10 


.71 


.45 




195 


20 


.89 


.64 






WMW 






year 


N 


Wio 3 


PSF 


Palt 




178 


i 


.96 


.91 


1960 


163 


2 


.90 


.96 




151 


3 


.88 


.85 




180 


1 


.62 


.97 


1970 


171 


2 


.70 


.96 




158 


3 


.70 


1.0 




196 


1 


.34 


.87 


1980 


173 


2 


.58 


.95 




167 


3 


.66 


.91 




198 


1 


.32 


.70 


1990 


175 


2 


.51 


.97 




166 


3 


.78 


.81 




207 


1 


.20 


.52 


2000 


193 


2 


.51 


.81 




182 


3 


.62 


.85 



Table 3: Results for the goodness-of-fit (GoF) analysis of 
the log-normal hypothesis for the distribution of C com p: 
WTW (upper table) and WMW (lower table), /^-values 
from the KS-test for different edge threshold e m and Gl- 
measure sets Msf and M a i t are shown. 



can essentially be interpreted as an illustration of global- 
isation, where the rise-and-fall of a score does not nec- 
essarily reflect an absolute change of some measure(s), 
but rather, how the score changed in comparison to all 
other nodes. If we consider a network as a closed sys- 
tem, this may be all that matters. Furthermore, one may 
(carefully) extrapolate how these scores evolve into the 
future, what might be of great importance considering 
global economic development and for tackling the chal- 
lenges which might in turn arise. 

A similar, but less pronounced behaviour is observed in 
the WMW. Considering the overlapping time window for 
both networks (1970-2000), one can see that the evolu- 



tion in one network is not necessarily reflected in the 
other. Such comparisons are possible, since node cen- 
tralities in both networks are measured using the same 
scale. Since both networks share a considerable frac- 
tion of nodes, this universality allows us to draw conclu- 
sions comparing node centralities in the WTW and the 
WMW. E.g., the US is (still) the most central node to 
both networks. Regarding its much higher CC-value in 
the WMW, one concludes that the US is much more cen- 
tral to the WMW than it is to the WTW, from which, taken 
by itself, a multitude of implications and further questions 
may arise. In addition, the large economic development 
of China after 1990, which is well represented by the evo- 
lution of its CC-score in the WTW, is what we might call 
a truly Great Leap Forward. This economic development 
is (so far) only weakly reflected in the WMW. An interest- 
ing question for future research is how a node's centrality 
in one network is related to its centrality another network; 
and, if there are significant relations between centralities 
in different networks, how these relations come about. To 
answer such questions consistently, i.e. to be ably to do a 
thorough and effective analysis of such relations, we will 
present a novel graphical tool for the evaluation of the 
evolution of composite centrality scores in the next sec- 
Itionl 

The above given examples demonstrate clearly how C^ mp 
provides a powerful tool to evaluate and compare node 
centralities for different evolving networks. One future 
task will be to test the applicability of the proposed frame- 
work to a larger variety of complex networks, or even very 
general complex systems of different topologies. This 
point will be addressed in Section IV-B. 

3.3 The Network Genetic Fingerprint 

Given a set of measures, M, the form of (|2]) and ([3]) al- 
lows for the possibility to derive a variety of intermediate 
SM while computing M comp , whereby the final value of 
M comp is independent of this ordering. A certain (invari- 
ant) ordering is called an inheritance scheme. To make 
this point clear, we consider Fig. 3| where we start with 
the 2 3 = 8 measures from Msf in the first generation (Gl). 
In each subsequent step, two SM are combined according 
to ([2]), generating sets of abstract higher generation mea- 
sures, (G2, G3, G4), while the single measure in the forth 
generation (G4) equals Mcomp f° r a U schemes. 
The D-R-S inheritance scheme for Msf of 



Fig. 3 



is defined as follows: Firstly, we combine S-measures 
(QL+QN having the same D-R-specification) to gener- 
ate the second generation measures (G2) characterising 
abstract D-R-scores. Secondly, we combine R-measures 
(SH+LO having the same D-specification). This results in 
measures characterising the in- and out-bounded central- 
ity of a node, while each score contains now information 
originating from four single measures. Finally, we com- 
bine the remaining two D-measures leading to M^ mp . In 
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Gl G2 

D-R-S D-R 



G3 

D 



G4 



Figure 3: Possible D-R-S inheritance scheme (see Tab. 
[TJ: First generation measures (Gl) are successively com- 
bined to higher-generation measures (G2, G3, G4) us- 
ing ([2]), whereby intermediate measures provide addi- 
tional information. M com p> an d hence C comp , are scheme- 
independent. 



this way, one arrives at M^ mp iteratively using (|2), while 



different schemes lead to the same numerical va 



ues for 



.Mcomp and C^ mp , up to minor statistical fluctuations. For 
the WTW and the WMW, these fluctuations of M c s F mp are 



?WTW 



10 



and 



?WMW , 



10 ), respectively. 



of order 

While Gl- and G4-measures are the same in the case 
at hand, G2- and G3- measures of different inheritance 
schemes provide additional and complementary informa- 
tion regarding the role of individual nodes in a network 
and their evolution over time. 

In this section, we present a graphical tool which en- 
ables one to easily monitor and analyse individual con- 
tributions to a node's composite centrality and their evo- 
lution over time, given a certain scheme. Fig. 4 shows 



lower generation measures: Additional information is 
gained/conserved. Second, taking specifically into ac- 
count that China has been the most populous country dur- 
ing the whole observation period ETTl . and assuming that 
the data of migration were at least weakly correlated with 
the population sizes, China had a dramatically low cen- 
trality in the WMW relative to its size. This becomes 
clearer when looking at its abstract IN- and OUT-scores. 
The situation changed after 1990, just like in the WTW; 
however, this change is more contained in the WMW. 
One possible usage of the NGFP is to analyse in detail 
a node's (or even a group of nodes') centrality to a net- 
work monitoring several scores, including abstract inter- 
mediate measures and their evolution over time. Note that 
a measure's score is always reflected by one or several 
physical real-world entities. After identifying such real- 
word contributions to a network measure, the NGFP can 
be used to quantify the influence of these contributions to 
a node's centrality. Considering the the two cases at hand, 
such real-world contributions may be given by economic 
and immigration policies and their consequences. Given a 
particular single or composite score, this information can 
then be of great use when, for example, directing futures 
decisions. 

Furthermore, the universality of SM allows to compare 
scores for different networks and investigate their rela- 
tions as it might be done for the WTW and the WMW. 
The NGFP offers a useful tool to do so inside a uniform 
framework, which demonstrates once more the normative 
power of the CC-framework. 



D-R-S NGFP: China (WMW) 



what we call the network genetic fingerprint (NGFP) for 
China in the WMW using the D-R-S scheme. For each 
year, the individual bar stands for G4-, G3-, G2- and Gl- 
SM from left to right, respectively. The relative height of 
each coloured region stands for the corresponding SM's 
magnitude taken from the zero-line (expectation), given 
m (7 C o m p. 

We remark that, for a Gl -measure, this zero line, when 
transformed back to the original graph measure, is in 
many cases closer to the median of that measure than to 
its mean (especially, if its original distribution is heavy- 
tailed, e.g. power-law-like). This is due to the non- 
linearity of the logarithm used in the measure standard- 
isation procedure. 

Total heights (positive/negative) on the right (Gl, G2, G3) 
sum up to heights on the left (G2, G3, G4), respectively. 
That is, for a given year, subtracting the negative part from 
the positive part is the same for each column (generation). 
Looking at the particular case at hand, we can im- 
mediately draw two conclusions: First, higher genera- 
tion scores may result from destructive interference of 



t i 



f 



I960 1970 1980 1990 2000 

year 

Figure 4: (colour online) Network genetic fingerprint 
(NGFP) for China in the WMW for the years 1960-2000. 
For each year, from left to right, the relative bar heights 
stand for the G4-, G3-, G2- and Gl -measures as described 

Total heights 



by the inheritance scheme shown in Fig. 3 



(positive/negative) on the right (Gl, G2, G3) sum up to 
heights on the left (G2, G3, G4), respectively. 
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4 Discussions 
4.1 Validity 



In |Section II-B[ we mentioned that, given a representative 
data set, we can perform a statistical measure normalisa- 
tion in terms of the sample standard deviation o s from a 
common mean. By representative we mean that a s is in- 
dependent of the sample size. This is important since we 
want to express our results independently of the network 
size and only in terms of statistical deviations. Looking 
at the change of a s for all SM and all generations, it is 
clear that the correlation with the network size is weak 
as compared to correlations among different measures. 
From this, we conclude that the assumption of a repre- 
sentative data set is justified for both, the WTW and the 
WMW, while changes in the o s are associated with under- 
lying topological changes (evolution). 



In Section III-B we argued furthermore that, if the as- 
sumptions of the Lyapunov theorem |[T9l |20l hold, the 
CLT can be extended to non-identically distributed ran- 
dom variables. These assumptions are: 

• Independence: The M; are independent of each 
other. 

• Lyapunov condition: Mi are a sequence of, not nec- 
essarily identical, non-degenerate random variables 
(measures), its partial sum S n = £" =1 M, and its vari- 
ance V„ = £" =1 of = Var(S n ), and there exists a 



8 > such that 



lim 

n— 



Z n k E(\M, 



\2+8\ 



v,; 



1+5/2 



o. 



(5) 



i.e. the 2 + 8 -moment exists. 



Generally, it cannot be assured that the 2 + 5 -moment 
exists. Furthermore, we neither specify nor investigate 
the particular distribution of individual measures. Be- 
cause of this inherent uncertainty, we choose the way 
of backward-engineering: Presuming that the necessary 
assumptions approximately hold, we use an a posteriori 
hypothesis test, for each case at hand, to decide if the 
resulting universal centrality distribution is an acceptable 
description of the observed data. This is exactly what we 
have done by considering the outcomes of a KS-test in 
combination with a rather strict decision rule, whereby 
the resulting GoF is a measure for modelling accuracy 



in the particular case (see Section III-B ). In fact, many 
low-generation SM considered in the SF show already 
an approximate normal distribution^} which indicates the 
existence of higher moments. 

The requirement of independence is generally hard to 
fulfil because many network measures are somehow and 



7 It has been observed that variations of logarithmic measures about 
their means frequently show normal behaviour for a wide range of 
systems [23]. 



somewhat dependent, and thus correlated. By imposing 
an edge threshold in the cases at hand, the strength of a 
node is automatically coupled to its degred^ Looking at 
the inner panels of Fig. 2\ such correlations are seen to 
be responsible for slightly tilting the normal frequency 
distributions to the left and the right for the WTW and the 
WMW, respectively. Nevertheless, we might assume that 
those correlations are rather weak, such that one can talk 
about quasi-independence, which is ultimately justified 
by the applicability of the approach. 



4.2 Limitations 

Considering the finiteness of Msf together with the uncer- 
tainties about the validity of the underlying assumptions 
(see the previous section) ), the log-normal distribution 
will surely not always be an acceptable description for the 
distribution of C comp (hypothesis). Therefore, we define 
the decision rule (DR) p > 0.1 (GoF from a KS-test, see 



Section III-B I whether to accept or reject the hypothesis. 
The log-normal distribution of C comp results from the 
interference between the single SM induced by the 
sampling leading to M comp . Thus, there are four major 
factors influencing the distribution of C comp : The number 
of measures n, the choice of measures M, the edge 
threshold e t h and the number of nodes N: 



n : It is expected that the distribution of C com p con- 
vergences to the standard log-normal with an in- 
creasing sample size, hence n. This is indeed the 
case. Nearly none of the Gl -measures passes DR for 
both, the WTW and the WMW. Starting from G2, 
nearly all measures pass DR for the WTW, and all 
IN-measures do so for the WMW. This difference 
is attributed to the homogeneous and heterogeneous 
structures of the WTW and the WMW, respectively 



(as described in Section III-B ). 



M : One expects that the GoF changes with the 
composition of M, since different measures gener- 
ally follow different distributions and take different 
numerical values and ranges. This is indeed the 
case. Given Msf, in order to improve the GoF, one 
might replace the Gl -measure which generates the 
lowest single p- value by a measure which generates 
a higher value to improve the GoF. A single measure 
which generates a relatively high individual /rvalue 
is eigenvector centrality, considering the WTW and 
the WMW We thus replace / in // out Qand f in /f out ^\ 



8 The same is true for the maximal flow. 

9 WTW: The relative low diameter lets / peak strongly at a certain 
value, causing large deviations from a normal distribution. 

10 WMW: The low algebraic connectivity X\ suggests that there are 
a lot of "bottlenecks" in the network, causing large deviations of the / 
distribution from a normal one. 
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with right/left eigenvector centrality, respectively. 
The results for the GoF usi ng this a lternative mea- 
sure sets M a i{^] is shown in Tab. Ill where we also 
considered different edge thresholds (see below). 
The generally different reactions of the GoF for the 
WTW and the WMW to the use of M alt instead of 
Msf illustrates what might ultimately be attributed 
to the complexity of the underlying systems: One 
sees that just using "well-behaved" Gl-measures 
does not guarantee a higher GoF. 



e t h- A similar non-uniform behaviour of the GoF for 
the WTW and the WMW is observed when looking 
at different edge thresholds. While for the WTW the 
GoF does not show a general relationship with the 
edge threshold, it generally increases with the edge 



threshold for the WMW after 1960 (see |Tab. III| ). 
Furthermore, the effects of edge threshold variations 
are qualitatively the same for the usage of M a \ t in- 
stead of Msf for both networks. 

N : See below. 



One sees that the log-normal hypothesis for the distribu- 
tion of C comp Q can generally be accepted for the discrip- 
tion of node centralities in the WTW and WMW using 
the standard set Msf- This strongly undermines the appli- 
cability of the proposed CC-framework for the universal 
description of node centralities across different networks. 
Both, the WTW as well as the WMW, are relatively small 
networks by their number of nodes comparing to other 
real-world networks, as for example the internet at any 
level HIH. On the one hand, we are interested in how 
a composite centrality score behaves for different sample 
sizes (number of nodes or edges) and, on the other hand, 
we want to investigate the universality claim of the pro- 
posed framework to deal with arbitrary sets of measures, 
not necessarily originating from complex networks. 
To achieve this, we consider a set M ar b of eight "mea- 
sures" whose values are drawn from eight different con- 
tinuous and discrete probability distributions stretching 
over different numerical ranges, where any particular 
choice can be arbitrary. The distributions and the val- 



ues of their generic parameters are given in Tab. IV 



Note that all measures fulfil the requirements of the Lya- 



punov theorem (see the previous section I. For sample size 
N = 10 2 — 2 • 10 4 , we standardise these measures, com- 
pute the corresponding composite centrality scores C^ mp 
and test the log-normal hypothesis calculating the GoF 
with the KS-tes{^] The results for the p- value (GoF) are 
shown in Fig. 5 1 (inner panel). It shows a degrading of the 
GoF as the sample size ,/V increases at the 95 % confidence 



Note that the physical interpretation in terms of the D-R-S classi- 
fication is now partly lost. 

12 The KS-test is limited to A' = 10 4 due to computational limita- 
tions. 



level. This decrease in the GoF is expected and can be ex- 
plained by the law of large numbers (LLN) from statistics 
GUI [191 , which, in our case, can be written as 



P I lim max II CDF! 



■distribution 
std. logN 



CD ^std.logN II - ° ] - !• 

(6) 

It states that the KS -statistic for synthetic samples drawn 
from Q converges to zero as the sample size increases 
(see |Section III-B[ >. On the other hand, the KS-statistic for 
data generated by sampling finitely many RV is expected 
to show a finite off-set (error) for increasing sample sizes, 
which illustrates the limits of our approximation for the 
description of C com p using the standard log-normal ([4]). 



distribution 


parameters 


uniform 


inf — , JCmax — 1 


normal 


^ = io 5 , o = 10 3 


log-normal 


11=0,0 = 10 


exponential 


n = 10 -3 


power-law 


■^"min — 1 5 — 3 • 5 


Pareto 


■^min — 10 . CC — 3.5 


Poisson 


At = 20 


binomial 


n=\N\,p = 0A 



Table 4: Probability distributions contributing to M ar b. In 
the right column, the numerical values for generic param- 
eters are given which have been used to generate samples 
of size N = 10 2 - 2 • 10 4 . Discrete measures have been 
shifted by one to guarantee positive values. 



However, for large samples, this off-set from the log- 
normal CDF is just the maximal error when estimating the 
upper or lower bound probability for a certain CC-value 
of a random chosen node or edgeJ^J Thus, the universal 
scale Q can be used for an effective description of the 
distribution of composite centrality measures, taking into 
account a finite, but supposedly small, error. This error 
depends on the particular set-up, like the number of mea- 
sures and their distributions. Consequently, for a reliable 
and scalable error estimation, these distributions should 
be known at least approximately. 



Fig. 5 (outer panel) shows such an error estimation for 
M ar b and increasing sample size at the 95% confidence 
level. As expected, the KS-statistic of synthetic sample 
undercuts the KS-statistic of M ar b at a certain point, which 
is reflected in a decrease of the GoF. On the other hand, 
the KS-statistic of M ar b levels off with increasing sample 



The maximal error for an interval is twice that value 



9 



KS-statistic at 95" , CL (n = 8, 100 runs) 




sample size Win 10 X 

Figure 5: (colour online) Inner panel: GoF of the log- 
normal hypothesis using the measure set M ar t,. The deci- 
sion rule (DR) is given by the red horizontal line. Outer 
panel: Evolution of the KS-statistic to the standard log- 
normal distribution Q for samples drawn from C^ mp 
(solid line) and the standard log-normal distribution itself 
(dashed line). Both panels show the evolution of aver- 
age values with respect to the sample size N including the 
95 % confidence intervals (coloured areas). 



size. The upper bound of the corresponding confidence 
interval is then interpreted as the maximal error when esti- 
mating the upper or lower bound probability for a certain 
CC-value at the given confidence level. Let us presume 



that the results given in Fig. 5 stem from a network, e.g. at 
different time instances, and that M ar b represents some set 
of node measures. Considering an instance with approxi- 
mately N = 10 4 nodes, one can then use the outer panel of 



Fig. 5 to tell that a randomly chosen node will (or will not) 
have a composite centrality smaller or larger than a spec- 
ified value, say the distribution's median 1, with a certain 
probability subject to an error smaller than 1.6% at the 
95 % confidence level. The maximal error is most likely 



to be attained in the middle part of the CDF (see Fig. 2i. 
The real error for the extremes of the CC-spectrum, which 
might be of greater interest, can be assumed to be much 
smaller than that. This demonstrates clearly how the CC- 
framework can be applied, even in cases with low GoF for 
the log-normal hypothesis, providing one with extraordi- 
nary predictive power for the description of general com- 
plex systems. 

5 Conclusions 



Motivated by observing the lack of a unified framework 
to describe evolving complex systems in general and com- 
plex networks in particular, we have presented the concept 
of composite centrality C comp (CC-framework) to evaluate 
node and edge centralities based on a set of several graph 
measures. A standardisation procedure based on a non- 
linear transformation and statistical normalisation allows 
for the uniform comparison of different graph measures 



and their combination, even for different networks. Based 
on a generalisation of the central limit theorem (GCLT), 
it is possible to derive a unified and parameter-free distri- 
bution function for the approximate description of C com p! 
which is given by the standard log-normal distribution Q. 
The implementation of invariant statistical inheritance 
schemes leading from an initial set of measures to C com p 
allows for the deduction of a variety of abstract centrality 
measures, which provide additional information. This is 
well depicted by the graphical analysis tool, which we la- 
bel the network genetic fingerprint (NGFP). As has been 
shown, the NGFP can be used when analysing a node's 
centrality and its evolution over time considering multi- 
ple measures at a time. 

To test the proposed framework and its implications, we 
have considered the world trade web (WTW) and the 
world migration web (WMW), both on the weighted and 
directed level. In both cases, we have concentrated on 
node measures considering the largest strongly-connected 
component of a threshold graph. We have presented a 
standard framework (SF), which we propose as a future 
paradigm for the analysis of general evolving complex 
networks, based on measures quantifying a node's cen- 
trality according to the criteria direction, range and struc- 
ture. We have demonstrated how C^ mp describes the evo- 
lution of node centralities and how it allows to compare 
scores for different networks using the unified scale given 
by the standard log-normal distribution Q. 
Since it is not possible to show a priori that the assump- 
tions of the GCLT hold for general cases, we have used 
an a posteriori hypothesis test calculating a goodness-of- 
fit parameter to decide in which situations the unified de- 
scription of node centralities through C CO mp is appropri- 
ate. It turned out that the standard log-normal hypothesis 
is generally accepted for a wide range of measure sets and 
network thresholds, making it a powerful tool for the anal- 
ysis of evolving complex networks. 
To extend the CC-framework to general measure sets, not 
necessarily originating from complex networks, and to in- 
vestigate the limitations of the log-normal hypothesis, we 
considered a set of measures drawn from eight different 
distributions for varying sample sizes. We observed a 
decreasing of the goodness-of-fit with increasing sample 
sizes, which can be explained by the law of large numbers 
from statistics. 

However, taking into account the relatively small error 
made by our approximation, the standard log-normal dis- 
tribution Q can still be used for the effective description 
of general composite centrality scores. 
This universality is what may be called a natural scale 
for complex networks. As we have shown, the proposed 
framework is not only limited to the study of evolving 
complex networks, but may also be used for general com- 
plex systems. 
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